{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hackathon #3\n",
    "\n",
    "Written by Eleanor Quint\n",
    "\n",
    "Topics: \n",
    "- Saving and loading TensorFlow models\n",
    "- Running TensorFlow-based Python programs on Crane\n",
    "- Overfitting, regularization, and early stopping\n",
    "\n",
    "This is all setup in a IPython notebook so you can run any code you want to experiment with. Feel free to edit any cell, or add some to run your own code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# We'll start with our library imports...\n",
    "from __future__ import print_function\n",
    "\n",
    "import os  # to work with file paths\n",
    "\n",
    "import tensorflow as tf         # to specify and run computation graphs\n",
    "import numpy as np              # for numerical operations taking place outside of the TF graph\n",
    "import matplotlib.pyplot as plt # to draw plots\n",
    "\n",
    "mnist_dir = '/work/cse496dl/shared/hackathon/03/mnist/'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# extract our dataset, MNIST\n",
    "train_images = np.load(mnist_dir + 'mnist_train_images.npy')\n",
    "train_labels = np.load(mnist_dir + 'mnist_train_labels.npy')\n",
    "test_images = np.load(mnist_dir + 'mnist_test_images.npy')\n",
    "test_labels = np.load(mnist_dir + 'mnist_test_labels.npy')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Clear the graph\n",
    "tf.reset_default_graph()\n",
    "\n",
    "x = tf.placeholder(tf.float32, [None, 784], name='data_placeholder')\n",
    "# use a single name scope for the model\n",
    "with tf.name_scope('linear_model') as scope:\n",
    "    hidden = tf.layers.dense(x, 200, activation=tf.nn.relu, name='hidden_layer')\n",
    "    output = tf.layers.dense(hidden, 10, name='output_layer')\n",
    "    tf.identity(output, name='model_output')\n",
    "\n",
    "global_step_tensor = tf.get_variable('global_step', trainable=False, shape=[], initializer=tf.zeros_initializer)\n",
    "saver = tf.train.Saver()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All the preceding code is copied (with small modifications) from hackathon 2. We'll use it to illustrate saving and loading. Some notable modifications are the declaration of the 'global_step_tensor', the addition of the 'model_output' identity operation (which adds to the graph, even though we don't save the handle), and the addition of 'saver'.\n",
    "\n",
    "### Saving and Loading TensorFlow Models\n",
    "\n",
    "To save a model with initialized variables, we use the [save method](https://www.tensorflow.org/api_docs/python/tf/train/Saver#save) of an instance of [tf.train.Saver](https://www.tensorflow.org/api_docs/python/tf/train/Saver). Notice that this returns the checkpoint path prefix which may be passed directly to `Saver`'s load functions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "save_directory = './hackathon3_logs'\n",
    "with tf.Session() as session:\n",
    "    session.run(tf.global_variables_initializer())\n",
    "    img, class_vec = session.run([x, output], {x: np.expand_dims(train_images[42], axis=0)})\n",
    "    print(class_vec)\n",
    "    imgplot = plt.imshow(img.reshape((28,28)))\n",
    "    \n",
    "    # the next lines save the graph and variables in save_directory \n",
    "    # as \"mnist_inference.ckpt.meta\" and \"mnist_inference.ckpt\"\n",
    "    path_prefix = saver.save(session, os.path.join(save_directory, \"mnist_classification\"), global_step=global_step_tensor)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we'll clear the graph and try to run a datum through the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Clear the graph\n",
    "tf.reset_default_graph()\n",
    "session = tf.Session()\n",
    "graph = session.graph\n",
    "# the following line fails because the placeholder tensor isn't in the graph anymore\n",
    "# session.run(output, {x: np.expand_dims(train_images[42], axis=0)})\n",
    "print(path_prefix)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the last line is uncommented, we get what is essentially an \"operation not found\" error. Now let's load the graph structure we saved before with [tf.train.import_meta_graph](https://www.tensorflow.org/api_docs/python/tf/train/import_meta_graph) and then use [Saver.restore](https://www.tensorflow.org/api_docs/python/tf/train/Saver#restore) to load and initialize the variable values. We can get handles to the in-graph `Tensor`s with [Graph.get_tensor_by_name](https://www.tensorflow.org/api_docs/python/tf/Graph#get_tensor_by_name) and passing the name of the tensor (which is differentiated from the name of the operation by the \":0\", which denotes the 0th tensor output of the op). We can then run the operations as normal."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# loading the meta graph re-creates the graph structure in the current session, and restore initializes saved variables\n",
    "saver = tf.train.import_meta_graph(path_prefix + '.meta')\n",
    "saver.restore(session, path_prefix)\n",
    "\n",
    "# get handles to graph Tensors, noticing the use of name scope in retrieving model_output\n",
    "x = graph.get_tensor_by_name('data_placeholder:0')\n",
    "output = graph.get_tensor_by_name('linear_model/model_output:0')\n",
    "print(graph.get_operations())\n",
    "\n",
    "img, class_vec = session.run([x, output], {x: np.expand_dims(train_images[42], axis=0)})\n",
    "print(class_vec)\n",
    "imgplot = plt.imshow(img.reshape((28,28)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Running TensorFlow-based Python programs on Crane\n",
    "\n",
    "Because these IPython notebooks are run on the Crane login node, we should not attempt to run more than a trivially sized program. This means that we are not allowed to run more than a few training steps of a small model. Larger jobs, like fully optimizing a model, must be submitted to Slurm, the job scheduling manager for the Crane node.\n",
    "\n",
    "We're now going to open a terminal from Jupyter to run the following commands:\n",
    "\n",
    "\n",
    "```\n",
    "cd $WORK\n",
    "cp /work/cse496dl/shared/hackathon/03/run_py_496dl.sh $WORK\n",
    "cp /work/cse496dl/shared/hackathon/03/basic.py $WORK\n",
    "```\n",
    "\n",
    "I've distributed a file called, \"run_py_496dl.sh\". It is most of what is needed to submit a Python program with TensorFlow installed and running on GPU. It expects a python file with a main function, and submits the job using `sbatch`:\n",
    "\n",
    "`sbatch ./run_py_496dl.sh basic.py`\n",
    "\n",
    "The way I have it written, it also passes through all arguments that follow the `.py`. Let's submit a job and then go over the details of the submit script. You can check on the status of your pending and running jobs with `squeue -u <USERNAME>`, substituting your Crane username, and you can cancel jobs with `scancel <JOBID>`, substituting the job id displayed by `squeue`. For more details, please visit the [HCC docs](https://hcc-docs.unl.edu/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Overfitting, regularization, and early stopping\n",
    "\n",
    "If we have enough parameters in our model, and little enough data, after a long period of training we begin to experience overfitting. Empirically, this is when the loss value of the data visible to the model in training drops significantly below the loss value of the data set aside for testing. It implies that the model is looking for patterns specific to the training data that won't generalize to future, unseen data. This is a problem.\n",
    "\n",
    "Solutions? Here are some first steps to think about:\n",
    "\n",
    "1. Get more data\n",
    "2. Reduce the number of parameters in the model\n",
    "3. Regularize the weight/bias parameters of the model\n",
    "4. Regularize using dropout\n",
    "5. Early Stopping\n",
    "\n",
    "Let's re-specify the network with regularization from [dropout](https://www.tensorflow.org/api_docs/python/tf/layers/dropout). Other common regularizers can be found in [tf.contrib.layers](https://www.tensorflow.org/api_docs/python/tf/contrib/layers) as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "tf.reset_default_graph()\n",
    "\n",
    "KEEP_PROB = 0.7\n",
    "x = tf.placeholder(tf.float32, [None, 784], name='input_placeholder')\n",
    "with tf.name_scope('linear_model') as scope:\n",
    "    # as strange as it sounds, using dropout on the input sometimes helps\n",
    "    dropped_input = tf.layers.dropout(x, KEEP_PROB)\n",
    "    hidden = tf.layers.dense(dropped_input,\n",
    "                             400,\n",
    "                             activation=tf.nn.relu,\n",
    "                             name='hidden_layer')\n",
    "    dropped_hidden = tf.layers.dropout(hidden, KEEP_PROB)\n",
    "    output = tf.layers.dense(dropped_hidden,\n",
    "                             10,\n",
    "                             name='output_layer')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or, alternatively, a network using [tf.contrib.layers.l2_regularizer](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/l2_regularizer)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "tf.reset_default_graph()\n",
    "\n",
    "x = tf.placeholder(tf.float32, [None, 784], name='input_placeholder')\n",
    "with tf.name_scope('linear_model') as scope:\n",
    "    hidden = tf.layers.dense(x,\n",
    "                             400,\n",
    "                             kernel_regularizer=tf.contrib.layers.l2_regularizer(scale=0.01),\n",
    "                             bias_regularizer=tf.contrib.layers.l2_regularizer(scale=0.01),\n",
    "                             activation=tf.nn.relu,\n",
    "                             name='hidden_layer')\n",
    "    output = tf.layers.dense(hidden,\n",
    "                             10,\n",
    "                             kernel_regularizer=tf.contrib.layers.l2_regularizer(scale=0.01),\n",
    "                             bias_regularizer=tf.contrib.layers.l2_regularizer(scale=0.01),\n",
    "                             name='output_layer')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When using any amount of numerical regularization, it is important to add the values to the final loss function that is used in `minimize`, otherwise the regularizers do nothing. Built-in regularizers are automatically added to a list that can be retrieved with [tf.get_collection](https://www.tensorflow.org/api_docs/python/tf/get_collection) which takes a [GraphKey](https://www.tensorflow.org/api_docs/python/tf/GraphKeys) and returns a list of tensors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define classification loss WITH regularization loss\n",
    "# In our case, it's L2, but could also commonly be L0, L1, or Linf\n",
    "y = tf.placeholder(tf.float32, [None, 10], name='label')\n",
    "cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=output)\n",
    "\n",
    "regularization_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)\n",
    "print(regularization_losses)\n",
    "# this is the weight of the regularization part of the final loss\n",
    "REG_COEFF = 0.001\n",
    "# this value is what we'll pass to `minimize`\n",
    "if regularization_losses:\n",
    "    total_loss = cross_entropy + REG_COEFF * sum(regularization_losses)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, the most important tool to combat overfitting is early stopping. This is the practice of saving copies of the parameters periodically and, after you've recognized that over-fitting is occurring (i.e., when the loss on the validation/test data doesn't decrease for a number of epochs), stop training and report the best saved copy of the parameters rather than the overfit version.\n",
    "\n",
    "Whether deciding which set of parameters to use from training or trying to decide which form of regularization will work best by trying different kinds, you can get a better idea of how your regularization decisions will affect true generalization by further splitting the training data into training and validation sets. The validation data is left unused in adjusting model parameters, but is still used in training to make regularization decisions, and this leaves the test data to be the true measure of generalization.\n",
    "\n",
    "You might use a function like the following for the task of splitting your data into two numpy arrays."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def split_data(data, proportion):\n",
    "    \"\"\"\n",
    "    Split a numpy array into two parts of `proportion` and `1 - proportion`\n",
    "    \n",
    "    Args:\n",
    "        - data: numpy array, to be split along the first axis\n",
    "        - proportion: a float less than 1\n",
    "    \"\"\"\n",
    "    size = data.shape[0]\n",
    "    split_idx = int(proportion * size)\n",
    "    np.shuffle(data)\n",
    "    return data[:split_idx], data[split_idx:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hackathon 3 Exercise 1\n",
    "\n",
    "Modify the `basic.py` file to add L2 regularization, split the training data to get a validation set, calculate loss on the validation (similar to test), and add early stopping. Train the model on Crane, submitting the job with `sbatch` and report the train, validation, and test loss values of the best set of parameters, along with the training epoch number they were saved from. (This is very similar to what you need to do for the first homework)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "### 1) MODIFY THE CODE, 2) TRAIN ON CRANE, 3) FILL THESE IN 4) SUBMIT THIS .IPYNB\n",
    "# EPOCH: \n",
    "# TRAIN LOSS: \n",
    "# VALIDATION LOSS:\n",
    "# TEST LOSS:"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "TensorFlow 1.12 (py36)",
   "language": "python",
   "name": "tensorflow-1.12-py36"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
